Skip to content

ci(simd): Phase 6 — AVX-512 dispatch check job#175

Merged
AdaWorldAPI merged 2 commits into
masterfrom
claude/pr-x-phase6-avx512-ci
May 20, 2026
Merged

ci(simd): Phase 6 — AVX-512 dispatch check job#175
AdaWorldAPI merged 2 commits into
masterfrom
claude/pr-x-phase6-avx512-ci

Conversation

@AdaWorldAPI
Copy link
Copy Markdown
Owner

Summary

Phase 6 of the integration plan in .claude/knowledge/simd-dispatch-architecture.md.

Adds a new tier4-avx512-check CI job that compiles the crate with -Ctarget-cpu=x86-64-v4 so the #[cfg(target_feature = "avx512f")] dispatch arm in src/simd.rs is exercised on every PR. Without this the AVX-512 code path bit-rots under the v3 default — it only compiles when a developer happens to build locally with --config .cargo/config-avx512.toml.

Implementation notes

  • cargo check instead of cargo test/cargo build. GH-hosted ubuntu-latest runners have intermittent AVX-512 silicon across VM SKUs (Azure D-series mix); a v4-baked binary would SIGILL on a non-AVX-512 host. check compiles through type/borrow/monomorphization without producing a runnable artifact — catches the dispatch-arm type mismatches that motivated this PR series in the first place (PR PR-X12 A1: CTU carrier + quad-tree partition #170 CI failure mode).
  • Job-level env: RUSTFLAGS: "-D warnings -Ctarget-cpu=x86-64-v4" overrides the global RUSTFLAGS="-D warnings" set at the top of ci.yaml. Without the override, .cargo/config-avx512.toml's rustflags would be ignored — env wins over config file in cargo's precedence (the same issue that broke PR feat(simd): Phase 1 — explicit cargo configs + AVX2 dispatch hardening #172 with v3 in config.toml).
  • Two check passes: default features + hpc-extras. The latter pulls the p64/fractal dep tree which exercises a different slice of the AVX-512 codepaths (BF16 RNE, AMX byte-asm). Each ~30 s with cache hit, ~3 min cold.
  • Added to conclusion job's needs list so a v4 check failure blocks merge.

Test plan

  • Watch this job pass on this PR's CI run — validates the v4 dispatch arm compiles clean.
  • If it fails, that's the diagnosis surface from PR PR-X12 A1: CTU carrier + quad-tree partition #170 — likely a type that's __m512-shaped in simd_avx512.rs but referenced through the v3 dispatch arm. Fix in a follow-up.

Generated by Claude Code

Phase 6 of the integration plan in `.claude/knowledge/
simd-dispatch-architecture.md`.

Adds a new `tier4-avx512-check` job that compiles the crate with
`-Ctarget-cpu=x86-64-v4` so the `#[cfg(target_feature = "avx512f")]`
dispatch arm in `src/simd.rs` is exercised on every PR. Without this
the AVX-512 code path bit-rots under the v3 default (`x86-64-v3` baseline
in `.cargo/config.toml`) — it compiles only when a developer happens to
build locally with `--config .cargo/config-avx512.toml`.

Implementation notes
--------------------

* `cargo check` instead of `cargo test`/`cargo build`. GH-hosted
  `ubuntu-latest` runners have intermittent AVX-512 silicon across VM
  SKUs (Azure D-series mix); a v4-baked binary would SIGILL on a
  non-AVX-512 host. `check` compiles through type/borrow/monomorphization
  without producing a runnable artifact — catches the dispatch-arm
  type mismatches that motivated this PR series in the first place
  (PR #170 CI failure mode).

* Job-level `env: RUSTFLAGS: "-D warnings -Ctarget-cpu=x86-64-v4"`
  overrides the global `RUSTFLAGS="-D warnings"` set at the top of
  `ci.yaml`. Without the override, `.cargo/config-avx512.toml`'s
  rustflags would be ignored — env wins over config file in cargo's
  precedence (the same issue that broke PR #172 with the v3 setting
  in config.toml).

* Two check passes: default features + `hpc-extras`. The latter pulls
  the p64/fractal dep tree which exercises a different slice of the
  AVX-512 codepaths (BF16 RNE, AMX byte-asm). Each runs ~30 s with
  Swatinem cache hit, ~3 min cold.

* Added to the `conclusion` job's `needs` list so a v4 check failure
  blocks merge.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 45b1bfe8ed

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread .github/workflows/ci.yaml Outdated
Comment on lines +201 to +202
- name: cargo check (v4 / AVX-512 dispatch arm)
run: cargo check -p ndarray --features approx,serde,rayon
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Split AVX-512 checks into distinct feature sets

Make the first cargo check truly non-hpc-extras; as written, cargo check -p ndarray --features approx,serde,rayon still enables default features, and Cargo.toml sets default = ["std", "hpc-extras"], so this pass already includes hpc-extras. That makes the second pass (--features ...,hpc-extras) effectively redundant, adding CI time without additional coverage and failing to validate the intended “without hpc-extras” configuration.

Useful? React with 👍 / 👎.

The previous iteration of tier4-avx512-check set `RUSTFLAGS=
"-Ctarget-cpu=x86-64-v4"` as a job-level env. That env applies to BOTH
the target compilation AND host build scripts (`build.rs` artifacts
cargo runs natively). On a GH-hosted runner without AVX-512 silicon,
those v4-baked build scripts SIGILL during dep compilation — the job
exited in 23 s before our own crate even started compiling.

Fix: use `CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUSTFLAGS` (the env form
that's documented to apply only when cargo produces artifacts for that
triple, NOT to host build scripts) plus explicit `--target=x86_64-
unknown-linux-gnu` so cargo distinguishes host from target even when
they share the triple. Result: v4 reaches our crate, baseline reaches
build scripts.

Cargo doc reference: https://doc.rust-lang.org/cargo/reference/config.html
#target<triple>rustflags — "These flags only apply to the final
artifact, and won't affect dependencies."
@AdaWorldAPI AdaWorldAPI merged commit 3c20392 into master May 20, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants